Hierarchical Phrase-Based Translation with Weighted Finite-State Transducers and Shallow-n Grammars

نویسندگان

  • Adrià de Gispert
  • Gonzalo Iglesias
  • Graeme W. Blackwood
  • Eduardo Rodríguez Banga
  • William J. Byrne
چکیده

In this article we describe HiFST, a lattice-based decoder for hierarchical phrase-based translation and alignment. The decoder is implemented with standard Weighted Finite-State Transducer (WFST) operations as an alternative to the well-known cube pruning procedure. We find that the use of WFSTs rather than k-best lists requires less pruning in translation search, resulting in fewer search errors, better parameter optimization, and improved translation performance. The direct generation of translation lattices in the target language can improve subsequent rescoring procedures, yielding further gains when applying long-span language models and Minimum Bayes Risk decoding. We also provide insights as to how to control the size of the search space defined by hierarchical rules. We show that shallow-n grammars, low-level rule catenation, and other search constraints can help to match the power of the translation system to specific language pairs.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Hierarchical phrase-based translation with weighted finite state transducers

This dissertation is focused in the Statistical Machine Translation field (SMT), particularly in hierarchical phrase-based translation frameworks. We first study and redesign hierarchical models using several filtering techniques. Hierarchical search spaces are based on automatically extracted translation rules. As originally defined they are too big to handle directly without filtering. In thi...

متن کامل

Hierarchical Phrase-Based Translation with Weighted Finite-State Transducers and Shallow-<italic>n</italic> Grammars

In this article we describe HiFST, a lattice-based decoder for hierarchical phrase-based translation and alignment. The decoder is implemented with standard Weighted Finite-State Transducer (WFST) operations as an alternative to the well-known cube pruning procedure. We find that the use of WFSTs rather than k-best lists requires less pruning in translation search, resulting in fewer search err...

متن کامل

Intersecting Hierarchical and Phrase-Based Models of Translation: Formal Aspects and Algorithms

We address the problem of constructing hybrid translation systems by intersecting a Hiero-style hierarchical system with a phrase-based system and present formal techniques for doing so. We model the phrase-based component by introducing a variant of weighted finite-state automata, called σ-automata, provide a self-contained description of a general algorithm for intersecting weighted synchrono...

متن کامل

A phrase-level machine translation approach for disfluency detection using weighted finite state transducers

We propose a novel algorithm to detect disfluency in speech by reformulating the problem as phrase-level statistical machine translation using weighted finite state transducers. We approach the task as translation of noisy speech to clean speech. We simplify our translation framework such that it does not require fertility and alignment models. We tested our model on the Switchboard disfluency-...

متن کامل

Improved Reordering for Shallow-n Grammar based Hierarchical Phrase-based Translation

Shallow-n grammars (de Gispert et al., 2010) were introduced to reduce over-generation in the Hiero translation model (Chiang, 2005) resulting in much faster decoding and restricting reordering to a desired level for specific language pairs. However, Shallow-n grammars require parameters which cannot be directly optimized using minimum error-rate tuning by the decoder. This paper introduces som...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • Computational Linguistics

دوره 36  شماره 

صفحات  -

تاریخ انتشار 2010